Speaker-dependent multipitch tracking using deep neural networks
Abstract
Multipitch tracking is important for speech and signal processing. However, it is challenging to design an algorithm that achieves accurate pitch estimation and correct speaker assignment at the same time. In this paper, deep neural networks (DNNs) are used to model the probabilistic pitch states of two simultaneous speakers. To capture speaker-dependent information, two types of DNN with different training strategies are proposed. The first is trained for each speaker enrolled in the system (speaker-dependent DNN), and the second is trained for each speaker pair (speaker-pair-dependent DNN). Several extensions, including gender-pair-dependent DNNs, speaker adaptation of gender-pair-dependent DNNs, and training with multiple energy ratios, are introduced later to relax constraints. A factorial hidden Markov model (FHMM) then integrates pitch probabilities and generates the most likely pitch tracks with a junction tree algorithm. Experiments show that the proposed methods substantially outperform other speaker-independent and speaker-dependent multipitch trackers on two-speaker mixtures. With multi-ratio training, the proposed methods achieve consistent performance at various energy ratios of the two speakers in a mixture.
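The decoding step described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: instead of a junction tree algorithm it runs plain Viterbi over the product of the two speakers' state spaces, which is exact for two chains but far slower on realistic state counts. The function name, the convention that state 0 means "unvoiced", and the toy scores are all assumptions made for illustration; in the paper the per-frame scores would come from the trained DNNs.

```python
import math


def viterbi_two_chains(log_emit, log_trans1, log_trans2, log_prior1, log_prior2):
    """Jointly decode two pitch-state chains (hypothetical helper).

    log_emit[t][i][j]: log score of frame t when speaker 1 is in state i
                       and speaker 2 is in state j (assumed to come from DNNs)
    log_trans1[a][i]:  log P(speaker 1 in state i | state a in previous frame)
    log_trans2[b][j]:  same for speaker 2
    Returns two lists of state indices, one per speaker.
    """
    n1, n2 = len(log_prior1), len(log_prior2)
    # delta[i][j]: best log score of any path ending in joint state (i, j)
    delta = [[log_prior1[i] + log_prior2[j] + log_emit[0][i][j]
              for j in range(n2)] for i in range(n1)]
    backptrs = []
    for t in range(1, len(log_emit)):
        new_delta = [[-math.inf] * n2 for _ in range(n1)]
        ptr = [[(0, 0)] * n2 for _ in range(n1)]
        for i in range(n1):
            for j in range(n2):
                best, arg = -math.inf, (0, 0)
                # The two chains move independently, so the joint transition
                # score is the sum of the per-speaker log transitions.
                for a in range(n1):
                    for b in range(n2):
                        s = delta[a][b] + log_trans1[a][i] + log_trans2[b][j]
                        if s > best:
                            best, arg = s, (a, b)
                new_delta[i][j] = best + log_emit[t][i][j]
                ptr[i][j] = arg
        delta = new_delta
        backptrs.append(ptr)
    # Pick the best final joint state, then follow the back-pointers.
    i, j = max(((i, j) for i in range(n1) for j in range(n2)),
               key=lambda ij: delta[ij[0]][ij[1]])
    path = [(i, j)]
    for ptr in reversed(backptrs):
        i, j = ptr[i][j]
        path.append((i, j))
    path.reverse()
    return [p[0] for p in path], [p[1] for p in path]


def toy_frame(n1, n2, favored, penalty):
    """Toy emission scores: 0 for the favored joint state, -penalty elsewhere."""
    return [[0.0 if (i, j) == favored else -penalty for j in range(n2)]
            for i in range(n1)]


# Toy setup (all values are made up): two states per speaker, sticky transitions.
stay, switch = math.log(0.9), math.log(0.1)
sticky = [[stay, switch], [switch, stay]]
flat_prior = [math.log(0.5)] * 2
frames = [
    toy_frame(2, 2, (0, 0), 5.0),   # strong evidence for (0, 0)
    toy_frame(2, 2, (1, 0), 0.5),   # weak evidence for (1, 0)
    toy_frame(2, 2, (0, 0), 5.0),   # strong evidence for (0, 0)
]
track1, track2 = viterbi_two_chains(frames, sticky, sticky, flat_prior, flat_prior)
print(track1, track2)  # [0, 0, 0] [0, 0, 0]
```

In the toy run, the weak middle-frame evidence for a state change in speaker 1 is outweighed by the sticky transitions, so both tracks stay in state 0. This smoothing across frames is the role the FHMM plays on top of the frame-level pitch probabilities.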
Similar resources
Convolutional neural network with adaptive windows for speech recognition
Although speech recognition systems are widely used and their accuracy continues to improve, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
The Use of Wavelets in Speaker Feature Tracking Identification System Using Neural Network
Continuous and discrete wavelet transforms (WT) are used to create a text-dependent, noise-robust speaker recognition system. In this paper we investigate the accuracy of speaker identification in nonstationary signals. Three methods are used to extract the essential speaker features, based on the continuous wavelet transform, the discrete wavelet transform, and power spectral density (PSD). To have better ide...
Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines
This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBMs) to build high-order eigenspaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting ...
Embedding-Based Speaker Adaptive Training of Deep Neural Networks
An embedding-based speaker adaptive training (SAT) approach is proposed and investigated in this paper for deep neural network acoustic modeling. In this approach, speaker embedding vectors, which are a constant given a particular speaker, are mapped through a control network to layer-dependent elementwise affine transformations to canonicalize the internal feature representations at the output...
EM-Based Gain Adaptation for Probabilistic Multipitch Tracking
We introduce an EM algorithm for automatic speaker gain adaptation, and use this approach for probabilistic multipitch tracking. We derive a lower bound on the log-likelihood of the gain parameters and use a fast pruning method to make lower-bound optimization efficient. We evaluate the performance of gain-adapted multipitch tracking on the GRID database, where 3000 speech mixtures were generat...
Journal title:
- The Journal of the Acoustical Society of America
Volume 141, Issue 2
Pages: -
Publication date: 2015